Female representation in government around the world is severely lacking. In the data we examined, no country averaged at least 50% women in its parliament over the last five years. The reasons for this are many and cannot be summarized briefly. Instead, we wanted to look at one possible byproduct: does the percentage of women in a country’s parliament have any correlation with the country’s GDP per capita?
For this study, we used two Gapminder datasets: one containing GDP per capita in inflation-adjusted US dollars, and the other containing the percentage of women in parliament. Both datasets contained these values across many countries and many years. To move forward with our observational study, we first pivoted and joined the two datasets into one complete dataset. We then narrowed the dataset to the highest-income countries by keeping only countries with an average GDP per capita of over $10,000. This left a dataset of 52 countries across 24 years, or 1,248 total observations.
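The pivot, join, and filter steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used in the study: the country names, values, and helper functions (`pivot_longer`, `inner_join`, `filter_by_mean_gdp`) are all hypothetical, and the toy tables only imitate a wide layout with one row per country and one column per year.

```python
from statistics import mean

# Hypothetical toy data in a wide layout: one row per country,
# one column per year. These are NOT values from the study.
gdp_wide = {
    "Aland": {1997: 9000.0, 1998: 12500.0},
    "Borduria": {1997: 3000.0, 1998: 3200.0},
    "Syldavia": {1997: 21000.0, 1998: 22000.0},
}
women_wide = {
    "Aland": {1997: 12.0, 1998: 14.5},
    "Borduria": {1997: 8.0, 1998: 9.0},
    "Syldavia": {1997: 30.0},  # 1998 is missing, so that row drops in the join
}


def pivot_longer(wide):
    """Turn {country: {year: value}} into {(country, year): value}."""
    return {(c, y): v for c, years in wide.items() for y, v in years.items()}


def inner_join(left, right):
    """Keep only the (country, year) keys present in both tables."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


def filter_by_mean_gdp(joined, threshold=10_000):
    """Keep countries whose mean GDP per capita exceeds the threshold."""
    by_country = {}
    for (country, _year), (gdp, _pct) in joined.items():
        by_country.setdefault(country, []).append(gdp)
    keep = {c for c, vals in by_country.items() if mean(vals) > threshold}
    return {k: v for k, v in joined.items() if k[0] in keep}


joined = inner_join(pivot_longer(gdp_wide), pivot_longer(women_wide))
filtered = filter_by_mean_gdp(joined)
print(sorted(filtered))
```

In this toy example the low-income country is dropped entirely, while a country-year with a missing value in either table is dropped by the inner join before the filter is applied.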
The first graph we produced, shown above, was a simple scatter plot of the current relationship between these two quantitative variables. To build it, we took every country in our dataset and averaged its GDP and percent women values from 2015-2019, giving one averaged data point per country in the scatter plot.
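The per-country averaging over 2015-2019 could look something like the sketch below. The rows and the `average_window` helper are hypothetical and only illustrate the grouping step; the real dataset has one row per country and year.

```python
from statistics import mean

# Hypothetical long-format rows: (country, year, gdp_per_capita, pct_women).
rows = [
    ("Aland", 2014, 30000.0, 20.0),  # outside the window, ignored
    ("Aland", 2015, 31000.0, 22.0),
    ("Aland", 2019, 33000.0, 26.0),
    ("Syldavia", 2016, 15000.0, 10.0),
    ("Syldavia", 2018, 17000.0, 14.0),
]


def average_window(rows, first_year=2015, last_year=2019):
    """Return {country: (mean GDP, mean percent women)} over the window."""
    grouped = {}
    for country, year, gdp, pct in rows:
        if first_year <= year <= last_year:
            grouped.setdefault(country, []).append((gdp, pct))
    return {
        country: (mean(g for g, _ in vals), mean(p for _, p in vals))
        for country, vals in grouped.items()
    }


averages = average_window(rows)
print(averages)
```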
| Term | Estimate | Standard Error | t-statistic | p-value |
|---|---|---|---|---|
| (Intercept) | 10701.4899 | 7675.6512 | 1.394213 | 0.169418331294586 |
| Proportion of Women in Government | 873.0195 | 271.3474 | 3.217350 | 0.00227176857313024 |
Our estimated intercept is 10,701.49, which means that for a country with 0% women in parliament, the model predicts an average GDP per capita of $10,701.49. However, the p-value for the intercept (about 0.169) is not statistically significant, so we cannot be confident that the intercept differs from zero. This reflects the large standard error of the intercept estimate (about 7,676). The estimated slope is 873.02, which means that for every one-percentage-point increase in women in parliament, we expect a country’s average GDP per capita from 2015-2019 to increase by $873.02. The p-value for the slope (about 0.002) is statistically significant, which means there is evidence of a positive linear association between the percentage of women in government and average GDP per capita.
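To make the interpretation of the coefficients concrete, the short sketch below plugs the estimates from the regression table into the fitted line and recomputes each t-statistic as the estimate divided by its standard error. The function names are illustrative only, and because the inputs are the rounded table values, the results may differ from the table in the last decimal place.

```python
# Coefficients and standard errors copied from the regression table above.
INTERCEPT, INTERCEPT_SE = 10701.4899, 7675.6512
SLOPE, SLOPE_SE = 873.0195, 271.3474


def predict_gdp(pct_women):
    """Fitted average GDP per capita (US$) for a given percentage of women."""
    return INTERCEPT + SLOPE * pct_women


def t_statistic(estimate, std_error):
    """t-statistic for testing whether a coefficient equals zero."""
    return estimate / std_error


# Prediction for a hypothetical country with 30% women in parliament.
print(round(predict_gdp(30), 2))

# A one-percentage-point increase changes the prediction by the slope.
print(round(predict_gdp(31) - predict_gdp(30), 2))

# Recomputed t-statistics, to compare with the table.
print(t_statistic(INTERCEPT, INTERCEPT_SE))
print(t_statistic(SLOPE, SLOPE_SE))
```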
| Variance of Response Values | Variance of Fitted Values | Variance of Residual Values |
|---|---|---|
| 436260320 | 74826480 | 361433841 |
Part of assessing the quality of our regression model is examining how much of the variability in the response values is accounted for by the model. The table above shows the total variance of the response values, the variance of the fitted values from our regression model, and the variance of the residual values from our regression model. The variance of the fitted values describes the amount of variability in the response accounted for by the explanatory variable, and the variance of the residual values describes the amount of variability left unaccounted for. Based on this table, most of the variance in the response values lies in the residuals and therefore was not accounted for by our model.
The proportion of the variability in the response values that is accounted for by our regression model can be calculated by dividing the variance of the fitted values by the variance of the response values. This value is known as R-squared, or the coefficient of determination, and it is equal to 0.171518 for our model. This is a fairly low R-squared value: our model accounts for about 17% of the variation in GDP, leaving roughly 83% unexplained. Additionally, our model has a high residual standard error of $19,200.59. This means that when we predict GDP using our model, we can expect to be off by roughly nineteen thousand US dollars on average.
The R-squared value and other statistics about the model fit can be found in the table below:
| R^2 | Adjusted R^2 | Residual Standard Error | F-Statistic | p-value | df | df.residual | Number of Observations |
|---|---|---|---|---|---|---|---|
| 0.171518 | 0.1549483 | 19200.59 | 10.35134 | 0.0022718 | 1 | 50 | 52 |
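As a check on the arithmetic, the fit statistics in the table above can be recovered from the three variances reported earlier. The sketch below is only an illustration; it assumes the reported variances are sample variances (divided by n - 1), which is an assumption on our part, although it does reproduce the reported residual standard error.

```python
import math

# Values copied from the variance table.
VAR_RESPONSE = 436260320
VAR_FITTED = 74826480
VAR_RESIDUAL = 361433841
N_OBS = 52        # countries in the regression
N_PREDICTORS = 1  # percentage of women in parliament

df_residual = N_OBS - N_PREDICTORS - 1

# R-squared: share of response variance captured by the fitted values.
r_squared = VAR_FITTED / VAR_RESPONSE

# Adjusted R-squared penalizes for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (N_OBS - 1) / df_residual

# Assumption: variances use an (n - 1) denominator, so the residual sum of
# squares is the residual variance times (n - 1).
residual_ss = VAR_RESIDUAL * (N_OBS - 1)
residual_std_error = math.sqrt(residual_ss / df_residual)

# F-statistic for the overall regression.
f_statistic = (r_squared / N_PREDICTORS) / ((1 - r_squared) / df_residual)

print(r_squared, adj_r_squared, residual_std_error, f_statistic)
```

With a single predictor, the F-statistic is the square of the slope's t-statistic, which is why the model p-value matches the slope p-value.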
For this plot we wanted to examine the relationship between our two variables over time. To accomplish this we created an animated scatter plot, with each frame showing one year. We see a consistent trend: both GDP and percentage of women generally increase as time goes on. Despite some sporadic year-to-year changes, this positive trend is fairly consistent across countries over a long period of time.
To better understand our results, we simulated values based on the linear regression above. We first created a noise function that takes the value on the regression line and adds a random error whose size is based on the residual variance of our model. We then generated one simulated GDP value for every country in our dataset and compared it to that country's observed mean GDP. Finally, we repeated this simulation 1,000 times in order to analyze the resulting R-squared values.
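One way to read the simulation procedure is sketched below. It is not the study's code: the data are randomly generated stand-ins, the error is assumed to be Gaussian with standard deviation equal to the residual standard error, and each R-squared is assumed to be the squared correlation between the simulated and observed GDP values. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def simulate_once(x, intercept, slope, sigma, rng):
    """One simulated response per country: fitted value plus Gaussian noise."""
    return [intercept + slope * xi + rng.gauss(0, sigma) for xi in x]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in data, NOT the study's 52 countries.
pct_women = [rng.uniform(5, 45) for _ in range(52)]
observed_gdp = [10_000 + 900 * p + rng.gauss(0, 19_000) for p in pct_women]

# Fit the line and estimate the residual standard error (n - 2 df).
intercept, slope = fit_line(pct_women, observed_gdp)
residuals = [o - (intercept + slope * p) for p, o in zip(pct_women, observed_gdp)]
sigma = (sum(r ** 2 for r in residuals) / (len(residuals) - 2)) ** 0.5

# Repeat the simulation 1,000 times, comparing simulated to observed GDP.
r2_values = []
for _ in range(1000):
    simulated = simulate_once(pct_women, intercept, slope, sigma, rng)
    r2_values.append(r_squared(simulated, observed_gdp))

print(f"mean simulated R-squared: {mean(r2_values):.4f}")
```

Because each simulated value adds fresh noise on top of the fitted line, the R-squared between simulated and observed values will generally be lower than the R-squared of the original regression.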
Our first attempt to examine the difference between our observed and simulated GDP values was to plot each against the percentage of women in side-by-side scatter plots. Despite a similar positive trend overall, the two plots differ in several ways. The observed values are densest around $20,000 to $25,000, with a few more spread-out outliers at higher GDP values, up to around $110,000. By contrast, the simulated values are spread much more evenly across GDP values, with the highest reaching only around $69,000. Another noticeable difference is that some of the simulated GDP values were negative, because the simulation adds random error to the regression line; negative values are not possible for actual GDP.
The scatter plot above shows the predicted GDP on the x-axis and the observed average GDP on the y-axis. The predicted GDP is generated by simulating random values around the linear regression line. This scatter plot shows the data we would expect to come from our linear model and compares it to what we actually observed. If the model fit the data perfectly, every point would fall on the line y = x.
Mean R-squared = 0.04521055
As the histogram shows, most of the simulated R-squared values are concentrated near 0, with the distribution heavily skewed right. This suggests little correlation. The mean R-squared value across all 1,000 simulations is about 0.045. This is a low R-squared value, meaning there is not a strong relationship between women in government and average GDP from 2015-2019. In other words, about 4.5% of the variation in a country’s average GDP from 2015-2019 can be explained by the percentage of women in its government. With an R-squared this small, we do not have enough evidence to reject the hypothesis that there is no correlation.
Given the small R-squared and the simulated values being far less extreme than the observed values, we unfortunately did not find convincing evidence of a correlation between the GDP of a country and the percentage of women in its government. Ideally we would have found a strong correlation suggesting that having more women in parliament is beneficial, especially since no country averages more than 50%, but there is still a positive way to read this result. If there is no evidence that the proportion of women in government affects the GDP of a country, then having more female representation carries no economic drawback and should be encouraged.